by DL Keeshin
November 13, 2024
Previously, I described how I created a process for sending answers back to the database. Today I am going to address (pun intended) location data using the Google Maps Geocoding API, as well as use OpenAI's ChatGPT API to assign a job role derived from a job title and job description.
The design approach for managing location data in the kDS Interview App Database uses a surrogate key—a unique hash derived from latitude and longitude coordinates—to uniquely identify each location. This hash-based identifier ensures that each geographic point is stored only once, reducing redundancy and improving lookup efficiency.
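To make that concrete, here is a minimal sketch of the hashing scheme (it mirrors the generate_location_hash function used later in this post): coordinates are rounded to five decimal places, roughly one meter of precision, so repeated geocoding of the same point always yields the same identifier.
import hashlib

# Sketch of the hash-based surrogate key described above
def location_hash(latitude, longitude):
    # Round to five decimal places (~1 meter) so repeated readings of the
    # same point produce the same identifier
    return hashlib.md5(f"{latitude:.5f},{longitude:.5f}".encode()).hexdigest()

print(location_hash(41.791512, -87.583130))
print(location_hash(41.7915123, -87.5831304))  # same hash as the line above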
reference.location Table
Location data is stored in the reference.location table. Its structure is designed to store not only latitude and longitude but also details like address, city, and country. The location_hash_id column is the primary key, generated from the latitude and longitude values and covered by a unique constraint, preventing duplicate entries for the same geographic point.
CREATE TABLE IF NOT EXISTS reference.location (
location_hash_id character varying(32), -- Unique identifier based on lat/lon hash
latitude_longitude point NOT NULL, -- Stores coordinates as a point (x=longitude, y=latitude)
description_ character varying(64), -- Descriptive label for the location
address_1 character varying(72), -- First line of address
address_2 character varying(72), -- Second line of address, if applicable
city_ character varying(72), -- City of the location
province_state character varying(32), -- State or province
postal_code character varying(10), -- ZIP or postal code
country_code character varying(4), -- ISO country code
source_ character varying(72), -- Source or origin of location data
effective_start_date date, -- Date when this location entry became valid
effective_end_date date, -- Date when this location entry ended, if applicable
CONSTRAINT location_pkey PRIMARY KEY (location_hash_id), -- Primary key
CONSTRAINT location_hash_id_unique UNIQUE (location_hash_id) -- Enforces unique constraint
);
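As a quick illustration (the hash value below is made up), inserting the same location_hash_id twice leaves only a single row; the loader script further down relies on this same ON CONFLICT behavior.
-- Illustrative insert; running the same statement a second time is silently skipped
INSERT INTO reference.location (location_hash_id, latitude_longitude, description_, effective_start_date)
VALUES ('0123456789abcdef0123456789abcdef', point(-87.58313, 41.79151), 'Example point', CURRENT_DATE)
ON CONFLICT (location_hash_id) DO NOTHING;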
The Python code below integrates with the Google Maps Geocoding API to retrieve latitude and longitude from an address, generate a unique hash from these values, and store the result in the reference.location table in PostgreSQL. The ON CONFLICT clause allows the code to skip duplicate inserts if a location with the same location_hash_id already exists.
Note: To run the following code, you'll need an API key with access to the Geocoding API. Sign up on the Google Cloud Platform and enable billing to obtain one. Google Maps charges $0.005 per geocoding request after the first 40,000 free requests each month.
import os
import json
import requests
import hashlib
import psycopg2
from datetime import date
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Google Maps Geocoding API key
api_key = os.environ.get('GEOCODEAPIKEY')
# Connect to PostgreSQL database
def connect_db():
return psycopg2.connect(
dbname=os.getenv("DBSURVEY"),
user=os.getenv("DBUSER"),
password=os.getenv("DBPASSWORD"),
host=os.getenv("DBHOST"),
port=os.getenv("DBPORT")
)
# Function to get coordinates from the Google Maps API
def get_coordinates(address, api_key):
base_url = "https://maps.googleapis.com/maps/api/geocode/json"
params = {"address": address, "key": api_key}
response = requests.get(base_url, params=params)
data = response.json()
if data["status"] == "OK":
location = data["results"][0]["geometry"]["location"]
latitude = location["lat"]
longitude = location["lng"]
return latitude, longitude
else:
print("Error:", data["status"])
return None
# Function to hash latitude and longitude into a unique ID
def generate_location_hash(latitude, longitude):
location_string = f"{latitude:.5f},{longitude:.5f}"
return hashlib.md5(location_string.encode()).hexdigest()
# Function to insert location into the PostgreSQL table
def insert_location(db_conn, latitude, longitude, address, description, city, province_state, postal_code, country_code, source):
# Generate location hash and formatted data
location_hash_id = generate_location_hash(latitude, longitude)
point_value = f"({longitude}, {latitude})" # PostgreSQL expects (x, y) format for point type
effective_start_date = date.today()
# SQL Insert statement
query = """
INSERT INTO reference.location (
location_hash_id, latitude_longitude, description_, address_1, city_, province_state, postal_code,
country_code, source_, effective_start_date
)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
ON CONFLICT (location_hash_id) DO NOTHING;
"""
data = (
location_hash_id, point_value, description, address, city, province_state, postal_code,
country_code, source, effective_start_date
)
# Insert into the table
with db_conn.cursor() as cursor:
cursor.execute(query, data)
db_conn.commit()
print(f"Inserted location: {description} at {latitude}, {longitude}")
# Main function to obtain coordinates and store them
def store_address_location(address, description, city, province_state, postal_code, country_code, source):
# Step 1: Retrieve coordinates
coordinates = get_coordinates(address, api_key)
if coordinates:
latitude, longitude = coordinates
# Step 2: Insert data into the database
db_conn = connect_db()
        insert_location(db_conn, latitude, longitude, address, description, city, province_state, postal_code, country_code, source)
db_conn.close()
# Example usage
address = "5700 S Dusable Lk Shr Drive" #Museum of Science and Industry - Chicago"
description = "Museum of Science and Industry - Chicago"
postal_code = "60637"
country_code = "US"
city = "Chicago"
province_state = "Illinois"
source = "Google Maps API"
store_address_location(address, description, city, province_state, postal_code, country_code, source)
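For reference, the script reads its configuration from a local .env file. A sketch with placeholder values looks like this (OPENAPIKEY is used by the role script later in this post):
GEOCODEAPIKEY=your-google-geocoding-api-key
OPENAPIKEY=your-openai-api-key
DBSURVEY=your-database-name
DBUSER=your-database-user
DBPASSWORD=your-database-password
DBHOST=localhost
DBPORT=5432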
A job role is simply a clear label that captures what a person does in a position. In the kDS discovery app, this role is part of a unique key, made up of industry, business unit, and job role, which the app uses to tailor interview questions to specific roles. OpenAI's ChatGPT API helps by taking job title and description data and deriving concise job roles from it.
To see how this works, we first need to create a new table for storing role data:
CREATE TABLE stage.role (
email_ varchar(96) NOT NULL, -- Email of the user associated with the role
job_title varchar(255) NOT NULL, -- Original job title
job_role varchar(128) NULL, -- Generated role name, standardized
job_role_detail JSONB NOT NULL, -- Detailed role description in JSONB format
create_date date NOT NULL, -- Creation date of the record
created_by varchar(96) NOT NULL, -- Creator or source, e.g., email
source_ varchar(96) NOT NULL, -- Source of data generation, e.g., "ChatGPT"
CONSTRAINT PK_role PRIMARY KEY (email_)
);
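The job_role_detail column holds the raw JSON that ChatGPT is asked to return (role, description, and rationale). An illustrative record, using the first test result shown at the end of this post (the description text here is a stand-in):
{
  "role": "Director of IT Architecture",
  "description": "Leads enterprise IT architecture and application development. Keeps IT strategy aligned with business objectives.",
  "rationale": "Simplified title for clarity and emphasis on key responsibilities"
}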
Next, the following Python code uses OpenAI's GPT-4 model to generate a standardized job role derived from a job title and job description, then inserts the result into the stage.role table, using a stored procedure to handle the JSONB data.
Note: To run the following code, you'll need an OpenAI API key. To get one, sign up or log in at OpenAI's website, navigate to API keys in the Account section, and generate a new key. OpenAI charges based on tokens (input and output text). GPT-4 models typically cost fractions of a cent per request, with longer prompts and responses increasing the token usage and cost. To date, OpenAI access charges for unit testing this Python script total 30 cents ($0.30).
import os
import openai
import json
import psycopg2
from dotenv import load_dotenv
from datetime import date
import re
# Load environment variables from .env file
load_dotenv()
openai.api_key = os.environ.get('OPENAPIKEY')
# Database connection details from environment variables
db_params = {
"dbname": os.getenv("DBSURVEY"),
"user": os.getenv("DBUSER"),
"password": os.getenv("DBPASSWORD"),
"host": os.getenv("DBHOST"),
"port": os.getenv("DBPORT")
}
# Function to generate prompt for job role
def generate_role_prompt(job_title, job_description):
prompt = (
f"Based on the following job title and description, please assign a more concise job title and provide a two-sentence role description.\n\n"
f"Job Title: {job_title}\n"
f"Description: {job_description}\n\n"
f"Respond in JSON format as follows:\n"
f"{{\n"
f" \"role\": \"\",\n"
f" \"description\": \"\",\n"
f" \"rationale\": \"\"\n"
f"}}"
)
return prompt
# Function to interact with GPT-4 and parse the JSON response
def get_role_code_from_gpt(job_title, job_description):
prompt = generate_role_prompt(job_title, job_description)
response = openai.ChatCompletion.create(
model="gpt-4-turbo",
messages=[{"role": "user", "content": prompt}]
)
# Clean up and parse JSON response
role_response = response['choices'][0]['message']['content'].strip()
role_response = re.sub(r"```json\n|```", "", role_response).strip()
try:
role_data = json.loads(role_response)
except json.JSONDecodeError as e:
print("Error parsing JSON:", e)
print("Response content was:", role_response) # Debugging line
raise ValueError("Failed to decode JSON from response.")
return role_data
# Example inputs; in the app these come from the interviewee (the email below is a placeholder)
email = "user@example.com"
job_title = "Director, Enterprise Architecture & Applications Group, Information Systems & Technology Services"
job_description = (
    "Oversees the development and implementation of enterprise IT architecture and applications. "
    "Ensures alignment between IT strategies and business objectives."
)
# Connect to PostgreSQL database
conn = psycopg2.connect(**db_params)
cur = conn.cursor()
# Generate role data from GPT-4
try:
role_data = get_role_code_from_gpt(job_title, job_description)
except ValueError as e:
print("Error generating role data:", e)
conn.close()
exit()
# Define metadata for insertion
source_ = "ChatGPT"
job_role_detail = json.dumps(role_data) # Convert role_data to JSON for JSONB column
# Insert into PostgreSQL table using a stored procedure
try:
cur.execute(
"""
CALL stage.up_insert_role(%s, %s, %s, %s, %s, %s, %s);
""",
(
email,
job_title,
role_data.get("role"),
job_role_detail, # JSONB job role detail
date.today(),
email, # created_by
source_
)
)
# Commit changes
conn.commit()
print("Database populated with new concise job role and rationale.")
except Exception as e:
print("Error inserting data:", e)
conn.rollback()
finally:
cur.close()
conn.close()
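The stage.up_insert_role procedure called above isn't listed in this post. As a rough sketch of what it might look like, assuming it simply upserts one role record per email address:
-- Hypothetical sketch of the stored procedure, not the actual definition
CREATE OR REPLACE PROCEDURE stage.up_insert_role(
    p_email varchar,
    p_job_title varchar,
    p_job_role varchar,
    p_job_role_detail jsonb,
    p_create_date date,
    p_created_by varchar,
    p_source varchar
)
LANGUAGE plpgsql
AS $$
BEGIN
    -- One role record per email_; update it if the user already has one
    INSERT INTO stage.role (email_, job_title, job_role, job_role_detail,
                            create_date, created_by, source_)
    VALUES (p_email, p_job_title, p_job_role, p_job_role_detail,
            p_create_date, p_created_by, p_source)
    ON CONFLICT (email_) DO UPDATE
       SET job_title       = EXCLUDED.job_title,
           job_role        = EXCLUDED.job_role,
           job_role_detail = EXCLUDED.job_role_detail,
           create_date     = EXCLUDED.create_date,
           created_by      = EXCLUDED.created_by,
           source_         = EXCLUDED.source_;
END;
$$;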
Here are testing results from the above script, showing each role_ that the OpenAI API/ChatGPT generated from the user-provided job title and job description:
job_title: Director, Enterprise Architecture & Applications Group, Information Systems & Technology Services
job_description: Oversees the development and implementation of enterprise IT architecture and applications. Ensures alignment between IT strategies and business objectives.
role_: Director of IT Architecture
rationale: Simplified title for clarity and emphasis on key responsibilities

job_title: Principal Group Engineering Manager, Cloud and Enterprise Security
job_description: Oversees the development and maintenance of security infrastructure for cloud and enterprise systems, making strategic architectural decisions and managing diverse engineering teams focused on security.
role_: Chief Security Engineering Manager
rationale: Simplifies original title while emphasizing leadership and scope of responsibility in security engineering.

job_title: Senior Vice President, Platform Product Management and Developer Relations
job_description: Responsible for guiding the development and strategy of the company's software platforms, while managing relationships and interactions with external developers. Ensures effective adoption and usage of platform tools by the developer community.
role_: Head of Platform Strategy & Developer Relations
rationale: The revised title concisely reflects leadership over both product strategy and community engagement.
This post explored how to handle location data and job roles within the kDS Interview App Database using external APIs. By leveraging Google Maps for precise geolocation and OpenAI’s ChatGPT API for standardizing job titles, we can efficiently store unique identifiers for locations and clarify job roles. This setup ensures a scalable, consistent data model, helping tailor interview questions to users' unique job roles and locations, ultimately enhancing data discovery and personalization in the app.
For more information about running and installing the scripts described in this post, please see the section called "Database & APP Installation Notes" in the GitHub README file here.
Thanks for stopping by.